We work in a general framework where the state of a physical system is defined by its behaviour 
under measurement and the global state is constrained by no-signalling conditions. We show that 
the marginals of symmetric states in such theories can be approximated by convex combinations 
of independent and identical conditional probability distributions, generalizing the classical finite 
de Finetti theorem of Diaconis and Freedman. Our results apply to correlations obtained from 
quantum states even when there is no bound on the local dimension, so that known quantum 
de Finetti theorems cannot be used. 

I. INTRODUCTION 



Given a bowl containing $n$ colored balls, we wish to compare two ways of obtaining a random sample of $k < n$ balls: 
(i) we randomly choose a ball, replace it with a ball of the same color, and repeat this step $k$ times; (ii) we do the 
same but don't replace the balls. If $k \ll n$, then the probability of obtaining a particular set of $k$ balls will be almost 
the same in both cases [1]. This observation has profound consequences for Bayesian statistical inference, as we now 
describe. 
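
The comparison of the two sampling schemes can be checked numerically for small urns. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, the distance is taken to be the sum of absolute differences over ordered color sequences, and the value $k(k-1)/n$ printed alongside is the bound quoted later in the text.

```python
from fractions import Fraction
from itertools import product


def prob_with_replacement(counts, colors):
    """Probability of drawing the ordered color sequence `colors` with replacement."""
    n = sum(counts)
    p = Fraction(1)
    for c in colors:
        p *= Fraction(counts[c], n)
    return p


def prob_without_replacement(counts, colors):
    """Probability of drawing the ordered color sequence `colors` without replacement."""
    remaining = list(counts)
    left = sum(remaining)
    p = Fraction(1)
    for c in colors:
        if remaining[c] == 0:
            return Fraction(0)  # this color is exhausted
        p *= Fraction(remaining[c], left)
        remaining[c] -= 1
        left -= 1
    return p


def l1_distance(counts, k):
    """Sum over all ordered color sequences of |P_with - P_without| (requires k <= n)."""
    return sum(
        abs(prob_with_replacement(counts, s) - prob_without_replacement(counts, s))
        for s in product(range(len(counts)), repeat=k)
    )


if __name__ == "__main__":
    k = 3
    for counts in ([5, 5], [50, 50], [500, 500]):
        n = sum(counts)
        # columns: n, exact distance, k(k-1)/n
        print(n, float(l1_distance(counts, k)), float(Fraction(k * (k - 1), n)))
```

Exact rational arithmetic is used so that the comparison with the bound is not affected by rounding.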

Suppose we perform an experiment $k$ times in order to estimate some physical quantity, e.g., the probability $\lambda$ that 
a muon decays in a given time. Let $A_i = 1$ if the $i$th muon decayed and $A_i = 0$ if it did not. If we assume that the 
results of the experiments are independent, we can posit some prior probability distribution $m(\lambda)$ and analyze our 
data by updating this probability distribution as more data arrives. Statisticians of de Finetti's subjective school 
are not willing to accept this assumption, however, since for them all probability distributions should be subjective 
degrees of belief, which $m(\lambda)$ is not. Instead, they make the weaker assumptions that the experiment could have 
been performed $n \gg k$ times and that there was nothing special about the experiments actually performed. These 
assumptions, together with the observation about colored balls above, can be shown to imply that there exists a 
distribution $m(\lambda)$ such that 

$$P[A_1, \ldots, A_k] \approx \int dm(\lambda)\, P_\lambda[A_1] \cdots P_\lambda[A_k], \tag{1}$$

i.e., the probability distribution $P[A^k]$ behaves as if the experiments really were independent and there really were 
some objective prior $m(\lambda)$. This is a statement of the famous de Finetti representation theorem [?]. Our results 
establish the same correspondence for measurement results in a more general, probabilistic, physical theory, where 
the state of a system is described by a conditional probability distribution. 

We now give a brief description of the setting and our results; precise definitions are given later on. A physical 
system in a probabilistic physical theory is made up of a number of subsystems (in our case identical), called particles. 
On each particle different measurements from a set $\mathcal{X}$ can be performed and outputs from a set $\mathcal{A}$ are obtained. The 
state of a particle is specified by a conditional probability distribution $P[A|X]$: the probability of obtaining result $a$ 
when performing measurement $x$ is given by $P[A = a|X = x]$. The possible states of $n$ particles are the conditional 
probability distributions $P[A^n|X^n]$ that obey a no-signalling property, which ensures that the reduced state on a 
subset of the particles is always well-defined. 

Our main result is that the joint state $P[A^k|X^k] = P[A_1 \cdots A_k|X_1 \cdots X_k]$ of $k$ particles randomly chosen from $n$ 
particles (or equivalently, the state of the first $k$ particles of a permutation-invariant state of $n$ particles) can be 
approximated by a convex combination of identical and independent conditional probability distributions, 

$$P[A^k|X^k] \approx \int dm(\lambda)\, P_\lambda[A|X]^{\times k}, \tag{2}$$

and that the error in the approximation is bounded by $|\mathcal{X}|k(k-1)/n$ in the appropriate distance measure, where 
$|\mathcal{X}|$ is the number of different possible measurements. (We write $P_\lambda[A|X]^{\times k}$ for $P_\lambda[A_1|X_1] \cdots P_\lambda[A_k|X_k]$.) Our result 
generalizes the finite de Finetti theorem of Diaconis and Freedman, who proved for classical probability distributions 
($|\mathcal{X}| = 1$) that the error in the approximation is no more than $k(k-1)/n$ [1], [?]. 

This paper is motivated by recent work on finite quantum de Finetti theorems, i.e., statements of the form 

$$\rho^k \approx \int dm(\sigma)\, \sigma^{\otimes k}, \tag{3}$$

where $\rho^k$ is the $k$-particle reduced density matrix of a permutation-invariant density matrix of $n$ particles with state 
space of dimension $d$, where the error is at most $4d^2k/n$ in the trace distance [?]. In fact, it is necessary that 
the error depends on $d$ [?], and so the quantum de Finetti theorem is not useful in applications where $d$ cannot be bounded. 
Our results are designed to apply in this setting: provided we have a bound on the number of ways $|\mathcal{X}|$ that a system 
is measured, the approximation in Eq. (2) will be good, even if there is no bound on the local dimension $d$. In recent 
years, quantum de Finetti theorems, especially Renner's so-called 'exponential' version [?], have been used to prove 
the security of quantum key distribution (QKD) schemes [7]. At the same time, attempts have been made to lift 
the assumption of a fixed (finite) local dimension [?]. Since quantum de Finetti theorems are necessarily dimension-dependent, 
they cannot be used in this setting. Although our theorems do not directly lead to security proofs either, 
we regard them as a first step towards this goal. 

We also prove a finite quantum de Finetti theorem for separable $\rho^n$: in this case there is an approximation of the form 
in Eq. (3) with error $k(k-1)/n$, independent of the dimension. We do not, however, know whether our techniques 
can be extended to prove the finite quantum de Finetti theorem in full generality. The issue is that our theorem 
concerns conditional probability distributions that arise from measuring quantum states and not the quantum states 
themselves. If we take, for example, a tomographically complete set of measurements, the representation described 
in Eq. (2) will in general contain distributions $P_\lambda[A|X]$ that cannot be obtained by performing the tomographic 
measurements on quantum states. One can, however, apply the argument of [?] to obtain the infinite quantum de 
Finetti theorem and indeed an infinite de Finetti theorem for any physical theory in what is known as the convex sets 
framework [11] (see [?] for the details). 

Another application of our work is to the study of classical channels. Fuchs, Schack and Scudo have used the 
Jamiołkowski isomorphism to transfer the infinite quantum de Finetti theorem ($n = \infty$, $k < \infty$) [?] to quantum 
channels [?]. Since a conditional probability distribution can be viewed as a classical channel with probability 
distributions as input and output, our results also provide a de Finetti theorem for classical channels. 

Outline. Our first task is to define an appropriate distance measure on states of $k$ particles in probabilistic theories, 
in order to quantify the error in Eq. (2). The distance between states should bound the probability of distinguishing 
them by measurement, and so we need to be clear about what measurement strategies are allowed. One possibility, 
which we explore in [?], is to restrict to strategies where each of the $k$ particles is measured individually. But when 
the conditional probability distributions arise from making informationally complete local measurements on entangled 
quantum states, the resulting distance measure fails to bound the trace distance between the quantum states. In the 
next section we show how to define a 'good' distance measure in which all noncontextual measurements are allowed, 
including all joint quantum-mechanical measurements. We then state and prove our results. In the last section, we 
explain the origin of the distance measure, the convex sets framework, which allows us to conclude with an open 
question on finite de Finetti theorems in this more general setting. 

II. A DISTANCE MEASURE FOR CONDITIONAL PROBABILITY DISTRIBUTIONS 

When we measure a quantum system, the probability of obtaining an outcome $a \in \mathcal{A}$ depends on which measurement 
$x \in \mathcal{X}$ we choose to perform on the system. It is usual to describe a quantum system using the formalism of 
density matrices, Hilbert spaces, and so on, but we can also describe the system by specifying a conditional probability 
distribution $P[A|X]$, where we write $P[A|X = x]$ for the distribution of measurement outcome $A$ given that measurement 
$x$ is performed [?]. While a classical system can be described using an unconditional probability distribution, 
the same is not true for a quantum system, since measuring a quantum system disturbs it, eliminating our ability to 
make a second, incompatible, measurement on the same system. 

We are therefore motivated to describe the state of an abstract system (not necessarily obeying quantum theory) 
using a conditional probability distribution $P[A|X]$. We view the conditional probability distribution $P[A|X]$ as the 
output distribution of a measurement that has been performed on system $A$. Alternatively, one can view $P[A|X]$ as 
a channel that produces an output distribution $P[A|X = x]$ on input $x$. For this reason we refer to the measurement 
setting $x$ as the input and the measurement result $a$ as the output. Generalizing from conditional probability distributions 
of one system, we shall consider a conditional probability distribution $P[A^n|X^n] = P[A_1 \cdots A_n|X_1 \cdots X_n]$, 
which describes an abstract system composed of $n$ subsystems, which we call particles. 

We need to be able to describe the state of a subset $\mathcal{I} \subseteq \{1, \ldots, n\}$ of the particles. Taking the marginal of a 
conditional probability distribution $P[A^n|X^n]$ yields a conditional distribution $P[A_{\mathcal{I}}|X^n]$, where the outputs at the 
particles in $\mathcal{I}$ depend on the inputs at all $n$ sites. In order to trace out the particles that are not in $\mathcal{I}$ entirely, 
rather than just the outputs obtained from measuring them, we need another notion, that of a conditional probability 
distribution being no-signalling. 

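Definition 1. A conditional distribution $P[A^n|X^n]$ is no-signalling if for all subsets $\mathcal{I} \subseteq \{1, \ldots, n\}$ with complements 
$\bar{\mathcal{I}} := \{1, \ldots, n\} \setminus \mathcal{I}$, 

$$P[A_{\mathcal{I}} = a_{\mathcal{I}}|X_{\mathcal{I}} = x_{\mathcal{I}}] := \sum_{a_{\bar{\mathcal{I}}}} P[A^n = a^n|X^n = x^n] \tag{4}$$

is independent of $x_{\bar{\mathcal{I}}}$ for all $a_{\mathcal{I}}$ and all $x_{\mathcal{I}}$. 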

The terminology derives from the following fact: if we divide the $n$ parties into two groups, $\mathcal{I}$ and $\bar{\mathcal{I}}$, then, provided 
$P[A^n|X^n]$ is no-signalling, it is impossible for the group $\bar{\mathcal{I}}$ to send a signal to the group $\mathcal{I}$ just by changing 
their inputs. Not all conditional probability distributions are no-signalling; for example, 
$P[A_1 = a_1, A_2 = a_2|X_1 = x_1, X_2 = x_2] = [a_1 = x_2][a_2 = x_1]$ (where $[t]$ is 1 if $t$ is true and 0 otherwise) is signalling. We note that any 
conditional probability distribution that arises from making local measurements on a quantum state is no-signalling. 
The no-signalling requirement is the minimal assumption necessary to ensure that the state of any subset of particles is 
well-defined. 
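
For two particles, the no-signalling condition can be tested directly by comparing marginals. The sketch below is illustrative only: the function names are hypothetical, inputs and outputs are taken to be bits, `swap_box` encodes the signalling example from the text, and `pr_box` is a standard no-signalling example (the Popescu-Rohrlich box) that is not taken from the text.

```python
from fractions import Fraction
from itertools import product

OUTPUTS = (0, 1)
INPUTS = (0, 1)


def swap_box(a1, a2, x1, x2):
    """The signalling example from the text: each output copies the other party's input."""
    return Fraction(int(a1 == x2 and a2 == x1))


def pr_box(a1, a2, x1, x2):
    """Popescu-Rohrlich box (an assumed example): a1 XOR a2 = x1 AND x2, uniformly."""
    return Fraction(1, 2) if (a1 ^ a2) == (x1 & x2) else Fraction(0)


def is_no_signalling(p):
    """Check the n = 2 case: each party's marginal must ignore the other party's input."""
    for a1, x1 in product(OUTPUTS, INPUTS):
        marginals = {sum(p(a1, a2, x1, x2) for a2 in OUTPUTS) for x2 in INPUTS}
        if len(marginals) != 1:
            return False
    for a2, x2 in product(OUTPUTS, INPUTS):
        marginals = {sum(p(a1, a2, x1, x2) for a1 in OUTPUTS) for x1 in INPUTS}
        if len(marginals) != 1:
            return False
    return True


if __name__ == "__main__":
    print("swap box no-signalling:", is_no_signalling(swap_box))
    print("PR box no-signalling:", is_no_signalling(pr_box))
```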

The goal of this paper is to approximate by product distributions a no-signalling conditional probability distribution 
on k particles arising from a symmetric conditional probability distribution on n systems, so we need to introduce 
a notion of distance for conditional probability distributions. This distance measure should generalize the classical 
variational distance, which is equal to the maximum probability of distinguishing two probability distributions, and 
the quantum trace distance, which is equal to the maximal probability of distinguishing two quantum states. In order 
to define a trace distance for no-signalling conditional probability distributions we therefore need to determine what 
measurement strategies can be used to distinguish two conditional probability distributions. In fact, there are three 
natural sets of measurement strategies for conditional probability distributions, each of which induces a distance 
measure on conditional probability distributions. We will work with the largest of these sets giving the strongest 
notion of a distance, for if we can show that two conditional probability distributions are almost indistinguishable 
using a particular set of measurements, it will trivially follow that they are also almost indistinguishable when only 
a subset of those measurements is allowed. Let us start by introducing the three sets. 

An individual measurement is a distribution $P[X^k]$ on the inputs that maps the conditional probability distribution 
to the unconditional probability distribution $P[A^kX^k] = P[A^k|X^k]P[X^k]$. Such a measurement can be carried out by 
measuring each subsystem individually. Note that individual measurements also make sense if we drop the condition 
that $P[A^n|X^n]$ is no-signalling. Since we restrict to no-signalling conditional probability distributions, a larger class 
of measurements is possible and indeed needed for applications. Suppose the conditional distribution $P[A_1A_2|X_1X_2]$ 
is no-signalling. We start by writing 

$$P[A_1A_2|X_1X_2 = x_1x_2] = P[A_1|X_1X_2 = x_1x_2]\, P[A_2|A_1, X_1X_2 = x_1x_2] \tag{5}$$
$$= P[A_1|X_1 = x_1]\, P[A_2|A_1, X_1X_2 = x_1x_2], \tag{6}$$

where we made use of the no-signalling principle, Eq. (4), in the second line. This provides an operational means to 
sample from $P[A_1A_2|X_1X_2 = x_1x_2]$: we first sample $a_1$ from the distribution $P[A_1|X_1 = x_1]$, then sample $a_2$ from 
$P[A_2|A_1 = a_1, X_1X_2 = x_1x_2]$. The important point is that a no-signalling conditional probability distribution can 
provide the output on system 1 before specifying which input is chosen for system 2. Therefore the following adaptive 
measurement on $P[A_1A_2|X_1X_2]$ is possible: input $x_1$, obtain $a_1$, and choose an input $x_2 = f(a_1)$, where $f : \mathcal{A} \to \mathcal{X}$ is 
an arbitrary function. Such a strategy can lead to a higher probability of distinguishing two no-signalling conditional 
probability distributions, compared to individual strategies [26]. 
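
The role of no-signalling in making adaptive measurements well defined can be illustrated with a small computation. This is a sketch under assumptions: inputs and outputs are bits, the function names are hypothetical, and the Popescu-Rohrlich box is an assumed example of a no-signalling distribution. For a no-signalling distribution the weights assigned to the outcomes $(a_1, a_2)$ by an adaptive strategy sum to one; for the signalling example they need not.

```python
from fractions import Fraction
from itertools import product

OUTPUTS = (0, 1)


def pr_box(a1, a2, x1, x2):
    """Popescu-Rohrlich box (an assumed no-signalling example)."""
    return Fraction(1, 2) if (a1 ^ a2) == (x1 & x2) else Fraction(0)


def swap_box(a1, a2, x1, x2):
    """A signalling distribution: each output copies the other party's input."""
    return Fraction(int(a1 == x2 and a2 == x1))


def adaptive_outcome_weights(p, x1, f):
    """Weights p(a1, a2 | x1, f(a1)): input x1 first, then choose x2 = f(a1)."""
    return {(a1, a2): p(a1, a2, x1, f(a1)) for a1, a2 in product(OUTPUTS, repeat=2)}


if __name__ == "__main__":
    good = adaptive_outcome_weights(pr_box, 1, lambda a1: a1)
    bad = adaptive_outcome_weights(swap_box, 0, lambda a1: 1 - a1)
    print("PR box, total weight:", sum(good.values()))
    print("swap box, total weight:", sum(bad.values()))
```

With the signalling box, the output $a_1$ would have to be known before the input $x_2$ on which it depends is chosen, which is why the weights fail to form a probability distribution.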
As in most of the paper, we draw intuition from quantum-mechanical correlations. It is a well-established fact 
that the distinguishability of quantum states depends on whether individual or adaptive measurement strategies are 
considered. In the quantum case, furthermore, it is possible to apply a joint measurement to all $k$ systems at once, a 
class of measurements which strictly contains adaptive measurements and can lead to strictly higher distinguishability. 
Quantum data hiding is an important application of this phenomenon [16, 17]. 

In defining joint operations on no-signalling conditional probability distributions, we essentially wish to allow all 
possible measurements whose outcomes behave like probability distributions. Motivated by this, we think of a no-signalling 
conditional probability distribution $P[A^k|X^k]$ as a vector in a real $(|\mathcal{A}||\mathcal{X}|)^k$-dimensional space and consider 
linear functions from this space to a real $|\mathcal{B}|$-dimensional space, where $\mathcal{B}$ is the set of measurement outcomes. The set of general measurements is the set of 
linear functions $M$ such that $M(P[A^k|X^k])$ is a probability distribution for all no-signalling conditional probability 
distributions $P[A^k|X^k]$. Clearly, individual and adaptive strategies belong to the set of general measurements, but 
it includes strictly more strategies, too. (The assumption of linearity is necessary so that our probabilities behave 
reasonably when we take convex combinations of states and measurements; see Ref. [?].) 

Definition 2. The trace distance between two no-signalling conditional probability distributions $P[A^k|X^k]$ and 
$Q[A^k|X^k]$ is given by 

$$\|P[A^k|X^k] - Q[A^k|X^k]\| := \sup_M \|M(P[A^k|X^k]) - M(Q[A^k|X^k])\|, \tag{7}$$

where the supremum is taken over all general measurements $M$ and $\|R[B] - S[B]\|$ is the classical variational distance 
for probability distributions $R[B]$ and $S[B]$ on system $B$. Extending the definition by imposing linearity, $\|\cdot\|$ is a 
norm on the space of (real) linear combinations of conditional probability distributions and hence obeys the triangle 
inequality. 
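
Since individual measurements belong to the set of general measurements, restricting the supremum in Definition 2 to them gives a lower bound on the trace distance that is easy to compute. The sketch below does this for two single-particle states. It rests on assumptions: the variational distance is taken to be the sum of absolute differences (the text does not fix the normalization here), the two example states and all function names are hypothetical, and inputs and outputs are bits.

```python
from fractions import Fraction

INPUTS = (0, 1)
OUTPUTS = (0, 1)


def copy_input(a, x):
    """Deterministic single-particle state: the output equals the input."""
    return Fraction(int(a == x))


def always_zero(a, x):
    """Deterministic single-particle state: the output is 0 for every input."""
    return Fraction(int(a == 0))


def individual_distance(p, q, input_dist):
    """L1 distance between the joint distributions P[A|X]P[X] and Q[A|X]P[X]."""
    return sum(
        input_dist[x] * abs(p(a, x) - q(a, x)) for x in INPUTS for a in OUTPUTS
    )


def best_individual_distance(p, q):
    """The distance is linear in P[X], so the maximum is attained at a fixed input."""
    return max(
        individual_distance(p, q, {x: Fraction(int(x == x0)) for x in INPUTS})
        for x0 in INPUTS
    )


if __name__ == "__main__":
    uniform = {0: Fraction(1, 2), 1: Fraction(1, 2)}
    print(individual_distance(copy_input, always_zero, uniform))
    print(best_individual_distance(copy_input, always_zero))
```

The two example states agree on input 0 and differ completely on input 1, so the best individual strategy always chooses input 1.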

A theory in which conditional probability distributions describe the state of a particle and where joint states of 
particles obey a no-signalling condition can be treated in the convex sets framework. The distance measure we 
introduced arises naturally in this framework. We review the convex sets framework in Section IV. This will give us 
a broader view on de Finetti theorems and will allow us to pose an open question regarding de Finetti theorems in 
the convex sets framework. 



III. OUR RESULTS 



Suppose we have a conditional probability distribution $P[A^n|X^n]$ describing $n$ particles. If we interchange the 
particles according to a permutation $\pi \in S_n$, the resulting conditional probability distribution is 

$$\pi P[A^n = a_1 \cdots a_n|X^n = x_1 \cdots x_n] = P[A^n = a_{\pi^{-1}(1)} \cdots a_{\pi^{-1}(n)}|X^n = x_{\pi^{-1}(1)} \cdots x_{\pi^{-1}(n)}].$$

We say that a conditional probability distribution $P[A^n|X^n]$ is symmetric if it is invariant under all permutations 
$\pi \in S_n$. If $|\mathcal{X}| = 1$, this definition reduces to the usual definition of a symmetric probability distribution. We can 
now state our main result: 

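Theorem 3. Suppose that $P[A^n|X^n]$ is a symmetric no-signalling conditional probability distribution. Then there 
exists a probability distribution $p_\lambda$ such that 

$$\Big\| P[A^k|X^k] - \sum_\lambda p_\lambda P_\lambda[A|X]^{\times k} \Big\| \le \min\left( \frac{2k|\mathcal{X}||\mathcal{A}|^{|\mathcal{X}|}}{n}, \frac{|\mathcal{X}|k(k-1)}{n} \right), \tag{8}$$

where the distribution $p_\lambda$ is on a finite set of single-particle conditional probability distributions, labeled by $\lambda$. 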
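
To get a feel for the size of the error in Theorem 3, the sketch below evaluates the bound for given parameters. It assumes that the bound has the form $\min(2k|\mathcal{X}||\mathcal{A}|^{|\mathcal{X}|}/n,\ |\mathcal{X}|k(k-1)/n)$, i.e., the bound of Lemma 5 with $m = n/|\mathcal{X}|$; the function name and argument names are hypothetical.

```python
from fractions import Fraction


def theorem3_bound(n, k, num_inputs, num_outputs):
    """Assumed form of the bound: min(2k|X||A|^|X| / n, |X| k (k-1) / n)."""
    extreme_points = num_outputs ** num_inputs  # number of deterministic maps X -> A
    first = Fraction(2 * k * num_inputs * extreme_points, n)
    second = Fraction(num_inputs * k * (k - 1), n)
    return min(first, second)


if __name__ == "__main__":
    # Two measurements with two outcomes each, n = 1000 particles.
    for k in (5, 50):
        print(k, theorem3_bound(n=1000, k=k, num_inputs=2, num_outputs=2))
```

For small $k$ the quadratic term is the smaller one, while for larger $k$ the term linear in $k$ takes over.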

This establishes that the state of a random subset of $k$ out of $n$ particles is well approximated by a convex 
combination of independent and identically distributed conditional probability distributions. To prove Theorem 3, 
we first show that if $P[A^n|X^n]$ is symmetric and $m$ is chosen to be sufficiently small, then $P[A^m|X^m]$ is separable 
(Lemma 4). We then establish a de Finetti theorem for separable states, Lemma 5, which will complete the proof of 
our main result, Theorem 3. We continue with Lemma 4. 

FIG. 1: Since $n = m|\mathcal{X}|$, we can divide the particles into $m$ groups of $|\mathcal{X}|$ particles. In each of these groups we measure one 
particle according to each measurement in $\mathcal{X}$ in advance and record a list of all the results. In the simulation, if particle $i$ is 
supposed to be measured according to a measurement $x \in \mathcal{X}$, we just look through the $i$th group until we come to the particle 
on which measurement $x$ was performed in advance, and output the result we find. 



Lemma 4. Let $n > |\mathcal{X}|$ and set $m = \lfloor n/|\mathcal{X}| \rfloor$. Suppose that $P[A^n|X^n]$ is a symmetric no-signalling conditional 
probability distribution. Then $P[A^m|X^m]$ is separable, i.e., there exists a probability distribution $p_{\lambda_1,\ldots,\lambda_m}$ such that 

$$P[A^m|X^m] = \sum_{\lambda_1,\ldots,\lambda_m} p_{\lambda_1,\ldots,\lambda_m} P_{\lambda_1}[A_1|X_1] \cdots P_{\lambda_m}[A_m|X_m],$$

where $p_{\lambda_1,\ldots,\lambda_m}$ is a probability distribution on the labels $\lambda_1, \ldots, \lambda_m$, and each $\lambda_j$ labels a finite set of conditional probability 
distributions. 

Proof. In order not to obscure the main argument, we prove the statement for integral $m = n/|\mathcal{X}|$ [27]. Our technique 
can be traced to Werner [?]. We imagine the $m$ particles to be separated in space and note that $P[A^m|X^m]$ is 
separable if and only if it can be simulated by a local hidden variable model. Such a simulation is described in Fig. 1. 
We now provide the formal proof. We construct a separable conditional distribution $Q[A^m|X^m]$ and then show that 
it is equal to $P[A^m|X^m]$. We assume that $\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$, define a vector $y^n = (y_j)_{j=1,\ldots,n}$ with coordinates 
$y_j = ((j - 1) \bmod |\mathcal{X}|) + 1$, and define the separable state 

$$Q[A^m|X^m] = \sum_{b^n} P[A^n = b^n|X^n = y^n]\, Q_{b^n,1}[A_1|X_1] \cdots Q_{b^n,m}[A_m|X_m],$$

where $b^n \in \mathcal{A}^n$ is distributed according to $P[A^n = b^n|X^n = y^n]$ and the single-particle conditional probability 
distributions are deterministic and defined by $Q_{b^n,i}[A_i = a_i|X_i = x_i] = [a_i = b_{(i-1)|\mathcal{X}|+x_i}]$, where $[t] = 1$ if $t$ is true 
and 0 otherwise. Let $\mathcal{L} = \{1, 2, \ldots, n\}$, $\mathcal{L}_1 = \{(i-1)|\mathcal{X}| + x_i : i = 1, 2, \ldots, m\}$ and $\mathcal{L}_2 = \mathcal{L} \setminus \mathcal{L}_1$. Further let $A_{\mathcal{L}} = A^n$, 
$A_{\mathcal{L}_1} = (A_{x_1}, A_{|\mathcal{X}|+x_2}, \ldots, A_{(m-1)|\mathcal{X}|+x_m})$ and $A_{\mathcal{L}_2} = A_{\mathcal{L}} \setminus A_{\mathcal{L}_1}$, and define $b_{\mathcal{L}}$, $b_{\mathcal{L}_1}$ and $b_{\mathcal{L}_2}$ similarly. We find 

$$Q[A^m = a^m|X^m = x^m] = \sum_{b^n} P[A^n = b^n|X^n = y^n]\, [a_1 = b_{x_1}] \cdots [a_m = b_{(m-1)|\mathcal{X}|+x_m}]$$
$$= \sum_{b_{\mathcal{L}_2}} P[A_{\mathcal{L}_1} = a^m, A_{\mathcal{L}_2} = b_{\mathcal{L}_2}|X_{\mathcal{L}_1} = x^m, X_{\mathcal{L}_2} = y_{\mathcal{L}_2}]$$
$$= P[A_{\mathcal{L}_1} = a^m|X_{\mathcal{L}_1} = x^m] = P[A^m = a^m|X^m = x^m],$$

where we started with the definition of $Q[A^m|X^m]$, split the summation over $\mathcal{L}_1$ and $\mathcal{L}_2$, dropped the conditioning 
over $X_{\mathcal{L}_2} = y_{\mathcal{L}_2}$ because of the no-signalling property of $P$, used the definition of a marginal state, and, lastly, the 
permutation-invariance of $P$. □ 
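
The simulation used in the proof can be checked exactly on a small example. The sketch below is illustrative and rests on assumptions: $n = 4$, $|\mathcal{X}| = 2$, $m = 2$, binary outputs, inputs labelled from 0 rather than 1, and a symmetric no-signalling test state obtained by symmetrizing a Popescu-Rohrlich box on two particles together with uniform noise on the other two. This test state and all function names are hypothetical and not taken from the text.

```python
from fractions import Fraction
from itertools import permutations, product

NUM_INPUTS = 2            # |X|, inputs labelled 0..|X|-1
OUTPUTS = (0, 1)
M = 2                     # number of particles kept
N = M * NUM_INPUTS        # total number of particles
PERMS = list(permutations(range(N)))


def pr_box(a1, a2, x1, x2):
    """Popescu-Rohrlich box (an assumed no-signalling example)."""
    return Fraction(1, 2) if (a1 ^ a2) == (x1 & x2) else Fraction(0)


def base_state(a, x):
    """PR box on particles 0 and 1, uniform noise on particles 2 and 3."""
    return pr_box(a[0], a[1], x[0], x[1]) * Fraction(1, 4)


def symmetric_state(a, x):
    """Average of the base state over all permutations of the N particles."""
    total = sum(base_state([a[i] for i in perm], [x[i] for i in perm]) for perm in PERMS)
    return total / len(PERMS)


def marginal(a_m, x_m):
    """State of the first M particles; the inputs of the others are irrelevant."""
    rest_inputs = (0,) * (N - M)
    return sum(
        symmetric_state(tuple(a_m) + rest, tuple(x_m) + rest_inputs)
        for rest in product(OUTPUTS, repeat=N - M)
    )


def simulated(a_m, x_m):
    """Local model of Fig. 1: measure everything in advance with inputs y, then look up."""
    y = tuple(j % NUM_INPUTS for j in range(N))
    total = Fraction(0)
    for b in product(OUTPUTS, repeat=N):
        # particle i answers with the pre-recorded result for input x_i in group i
        if all(b[i * NUM_INPUTS + x_m[i]] == a_m[i] for i in range(M)):
            total += symmetric_state(b, y)
    return total


if __name__ == "__main__":
    settings = list(product(OUTPUTS, repeat=M))
    inputs = list(product(range(NUM_INPUTS), repeat=M))
    print(all(simulated(a, x) == marginal(a, x) for a in settings for x in inputs))
```

The two-particle marginal of this test state is a mixture of a PR box (weight 1/6) and uniform noise (weight 5/6), and the check confirms that the pre-recorded local model reproduces it.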

Our next statement is a de Finetti theorem for symmetric separable conditional probability distributions. 

Lemma 5. Suppose that $P[A^m|X^m]$ is a symmetric separable conditional probability distribution. Then there exists 
a probability distribution $p_\lambda$ such that 

$$\Big\| P[A^k|X^k] - \sum_\lambda p_\lambda P_\lambda[A|X]^{\times k} \Big\| \le \min\left( \frac{2k|\mathcal{A}|^{|\mathcal{X}|}}{m}, \frac{k(k-1)}{m} \right), \tag{9}$$

where $p_\lambda$ is a probability distribution on a finite set of conditional probability distributions, labeled by $\lambda$. 

Proof. Let $Q_1[A|X], \ldots, Q_E[A|X]$ be the extreme points of the set of conditional probability distributions of one 
system. These are the deterministic functions $\mathcal{X} \to \mathcal{A}$, hence $E = |\mathcal{A}|^{|\mathcal{X}|}$. Any symmetric separable conditional 
probability distribution is a convex combination of conditional probability distributions of the form 

$$Q[A^m|X^m] = \frac{1}{m!} \sum_{\pi \in S_m} Q_{i_{\pi^{-1}(1)}}[A_1|X_1] \cdots Q_{i_{\pi^{-1}(m)}}[A_m|X_m],$$

where $1 \le i_1, \ldots, i_m \le E$. Define $Q[A|X] := \frac{1}{m} \sum_{j=1}^m Q_{i_j}[A|X]$. We expand 

$$Q[A|X]^{\times k} = \sum_{j_1=1}^m \cdots \sum_{j_k=1}^m M_m(i_{j_1}, \ldots, i_{j_k})\, Q_{i_{j_1}}[A_1|X_1] \cdots Q_{i_{j_k}}[A_k|X_k], \tag{10}$$

where $M_m(i_{j_1}, \ldots, i_{j_k}) = 1/m^k$ is the multinomial distribution. To compare this expression with $Q[A^k|X^k]$, write 

$$Q[A^k|X^k] = \sum_{j_1=1}^m \cdots \sum_{j_k=1}^m H_m(i_{j_1}, \ldots, i_{j_k})\, Q_{i_{j_1}}[A_1|X_1] \cdots Q_{i_{j_k}}[A_k|X_k], \tag{11}$$

where $H_m(i_{j_1}, \ldots, i_{j_k})$ is the hypergeometric distribution for an urn with $m$ balls (see [?]). Then 

$$\|Q[A^k|X^k] - Q[A|X]^{\times k}\| = \Big\| \sum_{j_1,\ldots,j_k} \big( H_m(i_{j_1}, \ldots, i_{j_k}) - M_m(i_{j_1}, \ldots, i_{j_k}) \big)\, Q_{i_{j_1}} \cdots Q_{i_{j_k}}[A^k|X^k] \Big\|$$
$$\le \sum_{j_1,\ldots,j_k} \big| H_m(i_{j_1}, \ldots, i_{j_k}) - M_m(i_{j_1}, \ldots, i_{j_k}) \big|$$
$$\le \min\left( \frac{2kE}{m}, \frac{k(k-1)}{m} \right), \tag{12}$$

where we used the triangle inequality and Diaconis and Freedman's result on estimating the hypergeometric distribution 
with a multinomial distribution [1]. □ 
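
Two ingredients of this proof are easy to check by enumeration: the number of extreme points $E = |\mathcal{A}|^{|\mathcal{X}|}$, and the gap between drawing $k$ positions out of $m$ with and without replacement. The sketch below does both. It is an illustration under assumptions: the function names are hypothetical, the gap is computed over position tuples $(j_1, \ldots, j_k)$, and it is compared only with the bound $k(k-1)/m$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def deterministic_maps(inputs, outputs):
    """All deterministic functions from inputs to outputs, as dicts (the extreme points)."""
    return [dict(zip(inputs, values)) for values in product(outputs, repeat=len(inputs))]


def multinomial(js, m):
    """M_m(j_1, ..., j_k) = 1/m^k: draw k positions from m with replacement."""
    return Fraction(1, m ** len(js))


def hypergeometric(js, m):
    """H_m(j_1, ..., j_k): draw k positions from m without replacement."""
    k = len(js)
    if len(set(js)) < k:
        return Fraction(0)  # a position cannot be drawn twice
    return Fraction(factorial(m - k), factorial(m))


def l1_gap(m, k):
    """Sum over all position tuples of |H_m - M_m|."""
    return sum(
        abs(hypergeometric(js, m) - multinomial(js, m))
        for js in product(range(m), repeat=k)
    )


if __name__ == "__main__":
    print(len(deterministic_maps(("x", "y", "z"), (+1, -1))))  # |A|^|X| maps
    for k in (2, 3):
        print(k, l1_gap(5, k), Fraction(k * (k - 1), 5))
```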

These two lemmas enable the proof of Theorem 3. 

Proof of Theorem 3. Set $m = \lfloor n/|\mathcal{X}| \rfloor$ and apply Lemma 4. Then apply Lemma 5. □ 

Our final result is an application to quantum theory. In complete analogy to Lemma 5 we show that the $k$-particle 
reduced state of every separable symmetric density operator on $n$ copies of $\mathbb{C}^d$ is approximated by a convex 
combination of tensor product states. Importantly, the approximation guarantee is independent of the dimension $d$, 
in contrast to the case of entangled states where a dependence on the dimension is necessary [?]. The norm is given 
by the trace norm $\|A\|_1 = \mathrm{Tr}\sqrt{A^\dagger A}$ for operators $A$ on $\mathbb{C}^d$. It induces a distance measure on the set of quantum 
states that has a similar interpretation as a measure of distinguishability as the variational distance for probability 
distributions and the trace distance introduced on conditional probability distributions. 

Theorem 6. If $\rho^n$ is a separable permutation-invariant density operator on $(\mathbb{C}^d)^{\otimes n}$, then there is a measure $m(\sigma)$ on 
states $\sigma$ on $\mathbb{C}^d$ such that 

$$\Big\| \rho^k - \int dm(\sigma)\, \sigma^{\otimes k} \Big\|_1 \le \frac{k(k-1)}{n}. \tag{13}$$

Proof. Any symmetric separable state is a convex combination of states of the form 
$\omega^n = \frac{1}{n!} \sum_{\pi \in S_n} \tau_{\pi^{-1}(1)} \otimes \cdots \otimes \tau_{\pi^{-1}(n)}$, 
where $\{\tau_j\}_{j=1}^n$ is a set of pure states (these are extreme points in $\mathcal{B}(\mathbb{C}^d)$). Define $\tau := \frac{1}{n} \sum_{j=1}^n \tau_j$. We expand 

$$\tau^{\otimes k} = \sum_{j_1=1}^n \cdots \sum_{j_k=1}^n M_n(j_1, \ldots, j_k)\, \tau_{j_1} \otimes \cdots \otimes \tau_{j_k}, \tag{14}$$

where $M_n(j_1, \ldots, j_k) = 1/n^k$ is the multinomial distribution. To compare this expression with $\omega^k := \mathrm{Tr}_{n-k}\, \omega^n$, write 

$$\omega^k = \sum_{j_1=1}^n \cdots \sum_{j_k=1}^n H_n(j_1, \ldots, j_k)\, \tau_{j_1} \otimes \cdots \otimes \tau_{j_k}, \tag{15}$$

where $H_n(j_1, \ldots, j_k)$ is the hypergeometric distribution for an urn with $n$ balls (see [?]). Then 

$$\|\omega^k - \tau^{\otimes k}\|_1 = \Big\| \sum_{j_1,\ldots,j_k} \big( H_n(j_1, \ldots, j_k) - M_n(j_1, \ldots, j_k) \big)\, \tau_{j_1} \otimes \cdots \otimes \tau_{j_k} \Big\|_1$$
$$\le \sum_{j_1,\ldots,j_k} \big| H_n(j_1, \ldots, j_k) - M_n(j_1, \ldots, j_k) \big|$$
$$\le \frac{k(k-1)}{n}, \tag{16}$$

where we used the triangle inequality and Diaconis and Freedman's result on estimating the hypergeometric distribution 
with a multinomial distribution [1]. □ 

IV. TOWARDS A FINITE DE FINETTI THEOREM FOR THE CONVEX SETS FRAMEWORK 

We will start this section with a self-contained introduction to the convex sets framework. (See Refs. [10] and [11] 
for a gentler introduction.) We will then generalise Lemma 5 to this setting. Finally, we pose the question of the 
existence of a finite de Finetti theorem in the convex sets framework. 

Let $\Omega$ be the set of states of a particle. We assume that $\Omega$ is convex, compact, and has affine dimension $n$. In 
probability theory, for example, $\Omega$ is the simplex of probability distributions $(\omega_1, \ldots, \omega_{n+1})$, $\omega_i \ge 0$, $\sum_i \omega_i = 1$, while 
in quantum theory, $\Omega$ is (isomorphic to) the set of positive operators with trace one on a Hilbert space $\mathcal{H} = \mathbb{C}^d$. We 
are particularly interested in the case where $\Omega$ is specified by a set of conditional probability distributions $\{P_\lambda[A|X]\}$, 
whose elements are indexed by a label $\lambda$. This is partly because quantum states can be described in this way. For 
instance, the state $\rho$ of a qubit, a spin-$\frac{1}{2}$ system, is uniquely determined by the probabilities of obtaining spin up or 
down when it is measured along the $x$, $y$, or $z$ axes of the Bloch sphere. Thus a qubit can be described by a conditional 
probability distribution $P[A|X]$ with $\mathcal{A} = \{\pm 1\}$ and $\mathcal{X} = \{x, y, z\}$. Not all conditional probability distributions can 
be obtained by making local measurements on quantum states. This led Barrett to define generalized theories [18], 
where the state space $\Omega$ is the set of all conditional probability distributions $\{P_\lambda[A|X]\}$, denoted $\Box$. This is the case 
that we considered in the previous parts of the paper. When $|\mathcal{X}| = 1$, this reduces to classical probability theory. 
In quantum theory, $|\mathcal{X}| = 1$ corresponds to the case where all measurements on a system commute, and thus can be 
performed at once. In fact, every $\Omega$ can be mapped to a convex subset of $\Box$ for some number of fiducial measurements 
and outcomes [10, Lemma 1]. 
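
The description of a qubit by a conditional probability distribution can be made concrete. The sketch below uses the standard Bloch-vector formula $P[A = a|X = \text{axis}] = (1 + a\, r_{\text{axis}})/2$, which is an assumption added here and not spelled out in the text; the function names are hypothetical.

```python
from fractions import Fraction

AXES = ("x", "y", "z")
OUTCOMES = (+1, -1)


def qubit_conditional_distribution(bloch):
    """P[A = a | X = axis] = (1 + a * r_axis) / 2 for a qubit with Bloch vector r."""
    if sum(r * r for r in bloch.values()) > 1:
        raise ValueError("Bloch vector must have length at most 1")
    return {(a, axis): (1 + a * bloch[axis]) / 2 for axis in AXES for a in OUTCOMES}


def bloch_from_distribution(p):
    """Recover the Bloch vector: r_axis = P[+1 | axis] - P[-1 | axis]."""
    return {axis: p[(+1, axis)] - p[(-1, axis)] for axis in AXES}


if __name__ == "__main__":
    bloch = {"x": Fraction(0), "y": Fraction(3, 5), "z": Fraction(4, 5)}
    p = qubit_conditional_distribution(bloch)
    print(p[(+1, "z")], p[(-1, "y")])
    print(bloch_from_distribution(p) == bloch)
```

A distribution that assigns outcome $+1$ with certainty along all three axes would correspond to a Bloch vector of length $\sqrt{3}$ and is rejected by the check, which illustrates that not every element of $\Box$ arises from a quantum state.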

In quantum theory, the most general measurement that can be performed is a positive operator-valued measure 
(POVM), whose elements are termed effects. Effects are linear functions mapping states to probabilities: in (finite-dimensional) 
quantum theory, the probability of obtaining the outcome associated with an effect $r$, when the state 
is $\omega$, is $r(\omega) = \mathrm{Tr}(R\omega)$ for some bounded nonnegative operator $R$ with $R \le 1$. In a generalized theory, effects are 
also functions mapping states to probabilities, and these functions should be affine so that they are compatible with 
preparing convex combinations. The vector space of affine functions $a : \Omega \to \mathbb{R}$, denoted $A(\Omega)$, is isomorphic to $\mathbb{R}^{n+1}$. 
The cone of nonnegative affine functions on $\Omega$ is denoted $A^+(\Omega)$. The order unit of $A(\Omega)$ is the element $e \in A(\Omega)$ 
satisfying $e(\omega) = 1$ for all $\omega \in \Omega$. An effect is an element $a \in A(\Omega)$ satisfying $0 \le a(\omega) \le 1$ for all $\omega \in \Omega$. The set of 
all effects is denoted $[0, e]$. There is a natural embedding of $\Omega$ into $A(\Omega)^*$, the dual space of $A(\Omega)$, given by $\omega \mapsto \hat{\omega}$, 
where $\hat{\omega}(a) = a(\omega)$ for all $a \in A(\Omega)$. Furthermore, if $\hat{\omega} \in A(\Omega)^*$ satisfies $\hat{\omega}(a) \ge 0$ for all $a \in A^+(\Omega)$ and $\hat{\omega}(e) = 1$, 
then $\hat{\omega}$ is the image of some state $\omega \in \Omega$ [20, Section 2.6]. We identify $\hat{\omega}$ with $\omega$ in what follows. It is easy to check 
that $\|\cdot\| = \sup_{a \in [0,e]} |a(\cdot)|$ is a norm on $A(\Omega)^*$. For more details about the convex sets framework, see [10], [11]. 

A natural distance measure on the set of states, which generalises the variational distance between classical probability 
distributions and the trace distance between quantum states, is given by 

$$\|\omega - \omega'\| = \sup_{a \in [0,e]} |a(\omega) - a(\omega')|. \tag{17}$$

In quantum theory, systems are combined by taking the tensor product of the Hilbert spaces for each system. The 
same is true in the convex sets framework: $\omega \otimes \omega'$ is defined to be the product state where system $\Omega$ is in state $\omega$, 
system $\Omega'$ is in state $\omega'$, and the two systems are independent. The complication is that the space $A(\Omega)^*$ is a Banach 
space but not a Hilbert space and there are multiple ways to define a norm on the tensor product space, consistent 
with the norm on $A(\Omega)^*$. This choice affects the set of pure (i.e., norm 1) states of the joint system. At the very 
least, we want the set of joint states to be closed under convex combinations. This yields: 

Definition 7. The minimal tensor product of $\Omega$ and $\Omega'$, denoted by $\Omega \otimes_{\min} \Omega'$, consists of all convex combinations of 
product states $\omega \otimes \omega'$, $\omega \in \Omega$ and $\omega' \in \Omega'$. 

We say that states in $\Omega \otimes_{\min} \Omega'$ are separable, thereby extending terminology from quantum mechanics to the convex 
sets framework. Next, if $a$ is a valid effect for system $\Omega$ and $a'$ a valid effect for system $\Omega'$, then $a \otimes a'$ is the effect 
defined on product states via $(a \otimes a')(\omega \otimes \omega') = a(\omega)a'(\omega')$. If all convex combinations of such effects are to be allowed, 
the state space must only contain states in the maximal tensor product, defined via duality as: 

Definition 8. The maximal tensor product of $\Omega$ and $\Omega'$, denoted by $\Omega \otimes_{\max} \Omega'$, consists of all bilinear functions 
$\mu : A(\Omega) \times A(\Omega') \to \mathbb{R}$ that satisfy $\mu(a \otimes b) \ge 0$ for $a, b \ge 0$, and $\mu(e \otimes e') = 1$. 

Thus $\mu \in \Omega \otimes_{\max} \Omega'$ can be written as a linear combination of product states, possibly with negative weights. 
In classical probability theory, the minimal and the maximal tensor product coincide. In general, a tensor product 
$\Omega \otimes \Omega'$ is a convex set with $\Omega \otimes_{\min} \Omega' \subseteq \Omega \otimes \Omega' \subseteq \Omega \otimes_{\max} \Omega'$. In quantum theory, $\Omega \otimes \Omega'$ is the set of trace one 
positive operators on the (unique) Hilbert space tensor product of $\mathcal{H}$ and $\mathcal{H}'$. Note that $\Omega \otimes \Omega'$ lies strictly between 
the maximal and minimal tensor products in the quantum case. The set of separable quantum states is $\Omega \otimes_{\min} \Omega'$ 
and $\Omega \otimes_{\max} \Omega'$ is the set of trace one entanglement witnesses. 

For a state $\mu \in \Omega \otimes \Omega'$, we say that $\mu_\Omega \in \Omega$, defined by $a(\mu_\Omega) = (a \otimes e')(\mu)$ for all effects $a$, is the partial trace of $\mu$ 
with respect to $\Omega'$. An effect on the tensor product is an element $a \in A(\Omega \otimes \Omega')$ satisfying $0 \le a \le e \otimes e'$. The larger 
the set of joint states, the smaller the set of allowed effects. This means that the distance measure that we defined 
in Eq. (17), when applied to states of more than one particle, depends on which tensor product we use. It is true, 
however, that $\|\omega - \omega'\| \le \|\omega - \omega'\|_{\min}$, the distance measure for the minimal tensor product, since in that case the 
set of effects is largest. Also note that a physical theory may place additional restrictions on which effects are allowed 
but, even then, $\|\omega - \omega'\|$ provides an upper bound on the probability of distinguishing $\omega$ and $\omega'$. 

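Theorem 9. Let $\Omega$ be a convex set with $E$ extreme points ($E$ may be infinite). Suppose $\omega^n \in \Omega^{\otimes_{\min} n}$ is symmetric. 
Then there is a measure $m(\tau)$ on states $\tau \in \Omega$ such that 

$$\Big\| \omega^k - \int dm(\tau)\, \tau^{\otimes k} \Big\|_{\min} \le \min\left( \frac{2kE}{n}, \frac{k(k-1)}{n} \right). \tag{18}$$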



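Proof. Let $\tau_1, \ldots, \tau_E$ be the extreme points of $\Omega$. Any symmetric separable state is a convex combination of states of 
the form $\omega^n = \frac{1}{n!} \sum_{\pi \in S_n} \tau_{i_{\pi^{-1}(1)}} \otimes \cdots \otimes \tau_{i_{\pi^{-1}(n)}}$, where $1 \le i_1, \ldots, i_n \le E$. Define $\tau := \frac{1}{n} \sum_{j=1}^n \tau_{i_j}$. We expand 

$$\tau^{\otimes k} = \sum_{j_1=1}^n \cdots \sum_{j_k=1}^n M_n(i_{j_1}, \ldots, i_{j_k})\, \tau_{i_{j_1}} \otimes \cdots \otimes \tau_{i_{j_k}}, \tag{19}$$

where $M_n(i_{j_1}, \ldots, i_{j_k}) = 1/n^k$ is the multinomial distribution. To compare this expression with $\omega^k$, write 

$$\omega^k = \sum_{j_1=1}^n \cdots \sum_{j_k=1}^n H_n(i_{j_1}, \ldots, i_{j_k})\, \tau_{i_{j_1}} \otimes \cdots \otimes \tau_{i_{j_k}}, \tag{20}$$

where $H_n(i_{j_1}, \ldots, i_{j_k})$ is the hypergeometric distribution for an urn with $n$ balls (see [?]). Then 

$$\|\omega^k - \tau^{\otimes k}\|_{\min} = \Big\| \sum_{j_1,\ldots,j_k} \big( H_n(i_{j_1}, \ldots, i_{j_k}) - M_n(i_{j_1}, \ldots, i_{j_k}) \big)\, \tau_{i_{j_1}} \otimes \cdots \otimes \tau_{i_{j_k}} \Big\|_{\min}$$
$$\le \sum_{j_1,\ldots,j_k} \big| H_n(i_{j_1}, \ldots, i_{j_k}) - M_n(i_{j_1}, \ldots, i_{j_k}) \big|$$
$$\le \min\left( \frac{2kE}{n}, \frac{k(k-1)}{n} \right), \tag{21}$$

where we used the triangle inequality and Diaconis and Freedman's result on estimating the hypergeometric distribution 
with a multinomial distribution [1]. □ 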



One can show that $\Box^{\otimes_{\max} n}$ is precisely the set of all no-signalling conditional probability distributions and that 
$\Box^{\otimes_{\min} n}$ is the set of all separable conditional probability distributions [?]. Furthermore the trace distance 
(Definition 2) coincides with the definition in Eq. (17). With these observations and the fact that $\|\cdot\| \le \|\cdot\|_{\min}$ 
we see that Theorem 9 generalises Lemma 5. Unfortunately, we were not able to obtain a similar generalisation of 
Lemma 4 and hence of Theorem 3. We thus conclude with the question of whether a finite de Finetti theorem exists 
for general theories in the convex sets framework. We remark that the argument of [?] applied in this context yields 
an infinite de Finetti theorem for any theory in the convex sets framework (see [?] for the details). 